Toward a Robust Crowd-labeling Framework using Expert Evaluation and Pairwise Comparison
Abstract
Crowd-labeling emerged from the need to label large-scale and complex data, a tedious, expensive, and time-consuming task. One of the main challenges in crowd-labeling is to control for, or determine in advance, the proportion of low-quality or malicious labelers. If that proportion grows too high, there is often a phase transition leading to a steep, non-linear drop in labeling accuracy, as noted by Karger et al. [2014]. To address these challenges, we propose a new framework called Expert Label Injected Crowd Estimation (ELICE) and extend it to several versions and variants that delay the phase transition, leading to better labeling accuracy. ELICE automatically combines and boosts bulk crowd labels, supported by expert labels for a limited number of instances from the dataset. The expert labels help to estimate the individual ability of each crowd labeler and the difficulty of each instance, both of which are used to aggregate the labels. Empirical evaluation shows the superiority of ELICE over other state-of-the-art methods. We also derive a lower bound on the number of expert-labeled instances needed to estimate crowd ability and dataset difficulty and to obtain better-quality labels.
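The core idea in the abstract, using a few expert labels to score each labeler and then weighting the crowd's votes by those scores, can be illustrated with a minimal sketch. This is a simplified illustration and not the ELICE formulas from the paper: it assumes binary labels in {-1, +1}, it omits the instance-difficulty estimate entirely, and the function names `estimate_ability` and `aggregate` are hypothetical.

```python
import numpy as np

def estimate_ability(crowd_labels, expert_idx, expert_labels):
    """Score each labeler in [-1, 1] by agreement with the expert labels.

    crowd_labels: (n_labelers, n_instances) array with entries in {-1, +1}.
    expert_idx: indices of the instances that have an expert label.
    expert_labels: expert labels in {-1, +1} for those instances.
    Assumption: ability is the mean of (+1 for agree, -1 for disagree).
    """
    agreement = crowd_labels[:, expert_idx] * expert_labels
    return agreement.mean(axis=1)

def aggregate(crowd_labels, ability):
    """Ability-weighted vote; a negative ability flips that labeler's votes."""
    scores = ability @ crowd_labels
    return np.where(scores >= 0, 1, -1)

# Toy data: one accurate labeler and two who always give the opposite label.
truth = np.array([1, -1, 1, 1, -1, -1])
crowd = np.stack([truth, -truth, -truth])

# Only the first two instances carry expert labels.
expert_idx = np.array([0, 1])
ability = estimate_ability(crowd, expert_idx, truth[expert_idx])

weighted = aggregate(crowd, ability)
majority = np.where(crowd.sum(axis=0) >= 0, 1, -1)

print("ability:", ability)
print("weighted vote matches truth:", bool((weighted == truth).all()))
print("majority vote matches truth:", bool((majority == truth).all()))
```

In this toy setting the unweighted majority vote is dominated by the two adversarial labelers, while the ability-weighted vote recovers the true labels because consistently wrong labelers receive negative weight.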
Related resources
Toward a Robust and Universal Crowd-Labeling Framework
One of the main challenges in crowd-labeling is to control for, or determine in advance, the proportion of low-quality or malicious labelers. We propose methods that estimate the labeler-related and instance-related parameters using frequentist and Bayesian approaches. All of these approaches rely on expert-labeled instances (ground truth) for a small percentage of the data to learn the parameters. We als...
Quality Control of Crowd Labeling through Expert Evaluation
We propose a general scheme for quality-controlled labeling of large-scale data using multiple labels from the crowd and a “few” ground truth labels from an expert of the field. Expert-labeled instances are used to assign weights to the expertise of each crowd labeler and to the difficulty of each instance. Ground truth labels for all instances are then approximated through those weights along ...
Robust Crowd Labeling Using Little Expertise
Crowd-labeling emerged from the need to label large-scale and complex data, a tedious, expensive, and time-consuming task. But the problem of obtaining good-quality labels from a crowd and integrating them is still unresolved. To address this challenge, we propose a new framework that automatically combines and boosts bulk crowd labels, supported by a limited number of “ground truth” labels from ...
Determination of weight vector by using a pairwise comparison matrix based on DEA and Shannon entropy
The relation between the analytic hierarchy process (AHP) and data envelopment analysis (DEA) is a topic of interest to researchers in this branch of applied mathematics. In this paper, we propose a linear programming model that generates a weight (priority) vector from a pairwise comparison matrix. In this method, which is referred to as the E-DEAHP method, we consider each row of the pairwise...
Designing the Integrated Framework of Strategic Planning and Policy Making in Upstream Oil and Gas Drilling Sector
The aim of this study is to design an integrated framework for strategic planning and policy making in the upstream oil and gas drilling sector. To this end, a variety of robust strategies were designed using a SWOT matrix, and in order to weight and prioritize the decision options, all effective factors and parameters were extracted and explained using the Delphi technique and pairwise comp...
Journal title:
- CoRR
Volume abs/1607.02174
Publication date 2016